June 29, 2016

Today's Overview

  • Motivation
  • The probationer population
  • Methodology and evaluation
  • Implementation with Shiny R Markdown

Motivation

  1. Predict potential murderers in the current probation population
    • More dangerous than the typical probationer
  2. A vision for using this information
    • Manageable cases
    • Individualized interventions

Probationer Population

  • Mostly male (90%)
  • Mostly not murderers (>99%), but dangerous
  • Former state prisoners released on probation

Where did this project start?

Fig. 1 from Berk et al. (2009)

N = 30,000

Model 1

library(randomForest)

# Baseline forest: all candidate predictors, default sampling and cutoff
fit <- randomForest(Murder ~ Age + White + Male + Total_Pop +
                        Black_Pop + Prop_Black + Income +
                        Zip_Present + Gang + ViolentCase,
                    data = train,
                    importance = TRUE,
                    ntree = 1500)

Model 2

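# Reduced predictor set, balanced per-tree sampling, asymmetric vote cutoff
fit2 <- randomForest(Murder ~ Age + Total_Pop + Black_Pop +
                         Prop_Black + Income + Zip_Present +
                         ViolentCase,
                     data = train,
                     importance = TRUE,
                     ntree = 1500,
                     mtry = 2,                             # variables tried at each split
                     cutoff = c(0.65, 0.30),               # vote cutoffs by class, in factor-level order
                     sampsize = c("0" = 100, "1" = 34),    # cases drawn per class for each tree
                     strata = as.factor(train$Murder),     # sample within outcome classes
                     keep.inbag = TRUE,                    # record which cases each tree used
                     na.action = na.roughfix)              # impute NAs with median / mode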
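
How the cutoff in Model 2 changes predictions can be sketched in base R. randomForest documents the winning class as the one with the largest ratio of vote share to cutoff. The helper classify_with_cutoff and the toy vote fractions below are hypothetical and are not output from the fitted model. The sketch also assumes the factor levels are "0" and "1", as the sampsize names suggest.

```r
# Illustration only: toy vote fractions, not output from fit2.
# Winning class = largest (vote share / cutoff), per the randomForest docs.
classify_with_cutoff <- function(votes, cutoff) {
  stopifnot(ncol(votes) == length(cutoff))
  ratio <- sweep(votes, 2, cutoff, "/")          # divide each column by its cutoff
  colnames(votes)[max.col(ratio, ties.method = "first")]
}

# Each row is one hypothetical case: share of trees voting "0" and "1"
votes <- matrix(c(0.80, 0.20,
                  0.70, 0.30,
                  0.65, 0.35,
                  0.40, 0.60),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("0", "1")))

pred_default <- classify_with_cutoff(votes, c(0.50, 0.50))  # majority vote
pred_model2  <- classify_with_cutoff(votes, c(0.65, 0.30))  # Model 2 cutoffs

# Share of "1" votes above which a case is labelled "1" under Model 2's cutoffs
threshold <- 0.30 / (0.65 + 0.30)

data.frame(votes_for_1 = votes[, "1"], pred_default, pred_model2)
```

Under these cutoffs a case needs only about 32% of trees voting "1" to be flagged, rather than a majority. That trades more false positives for fewer false negatives.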

Ongoing Evaluation

  • Context, context, context
  • False negatives are to be avoided

Shiny R Markdown Implementation

2,300 early releases from probation

Algorithms in the news

References

  • Berk, R., Sherman, L., Barnes, G., Kurtz, E., & Ahlman, L. (2009). Forecasting murder within a population of probationers and parolees: a high stakes application of statistical learning. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(1), 191-211.
  • Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., & Müller, M. (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12, 77. DOI: 10.1186/1471-2105-12-77.

Appendix A

Murderers by Location in LA County

Appendix B

Near Zero Variance

##                 freqRatio percentUnique zeroVar   nzv
## Murder         129.421875    0.01198035   FALSE  TRUE
## Age              1.053819    0.38936145   FALSE FALSE
## White            4.662822    0.01198035   FALSE FALSE
## Male             8.717113    0.01198035   FALSE FALSE
## ZIP              1.273859    3.73187972   FALSE FALSE
## Total_Pop        1.273859    3.54618426   FALSE FALSE
## Black_Pop        1.273859    3.28860669   FALSE FALSE
## Prop_Black       1.273859    3.42638074   FALSE FALSE
## Income           1.273859    3.51623338   FALSE FALSE
## PRIMARY CHARGE   1.296763    2.59374626   FALSE FALSE
## Gang             1.883745    0.01198035   FALSE FALSE
## RegisterSO      50.684211    0.01198035   FALSE  TRUE
## ViolentCase     11.704718    0.01198035   FALSE FALSE
## WeaponCase     104.658228    0.01198035   FALSE  TRUE
## DrugCase       537.516129    0.01198035   FALSE  TRUE
## MH               3.310354    0.01198035   FALSE FALSE
## Zip_Present      7.159335    0.01198035   FALSE FALSE
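
The columns above look like the metrics reported by caret::nearZeroVar(..., saveMetrics = TRUE), though the slides do not name the function, so that is an assumption. The base R sketch below mirrors that function's documented defaults (freqCut = 95/5, uniqueCut = 10). The helper nzv_metrics is hypothetical. The counts 16,566 and 128 are back-calculated from the Murder row rather than stated in the slides.

```r
# Base R sketch of near-zero-variance metrics (hypothetical helper).
# freqRatio     = count of most common value / count of second most common
# percentUnique = 100 * number of distinct values / number of cases
nzv_metrics <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(as.vector(table(x)), decreasing = TRUE)
  freq_ratio <- if (length(counts) >= 2) counts[1] / counts[2] else 0
  percent_unique <- 100 * length(counts) / length(x)
  zero_var <- length(counts) <= 1
  data.frame(freqRatio = freq_ratio,
             percentUnique = percent_unique,
             zeroVar = zero_var,
             nzv = zero_var ||
               (freq_ratio > freq_cut && percent_unique <= unique_cut))
}

# Counts back-calculated from the Murder row (assumption, see lead-in)
murder_like <- c(rep(0, 16566), rep(1, 128))
balanced    <- rep(c(0, 1), 50)   # toy balanced binary variable

rbind(murder_like = nzv_metrics(murder_like),
      balanced    = nzv_metrics(balanced))
```

A flagged outcome such as Murder is not a candidate for removal. The flag simply confirms how rare the event is, which is why Model 2 samples within classes.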

Appendix C

Model 1 ROC

Appendix D

Model 2 ROC

Appendix E

Confusion Matrix

Model 1

##                
## pred            Non-Murderers Murderers
##   Non-Murderers          6602        50
##   Murderers                 3         1

Model 2

##               
## pred           Non-Murderer Murderer
##   Non-Murderer         5651       12
##   Murderer              954       39
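
The two matrices can be reduced to comparable rates with a few lines of R. The counts are copied from the tables above, reading rows as predictions and columns as actual outcomes. The helper confusion_metrics is hypothetical and not part of the original analysis.

```r
# Counts copied from the Appendix E tables (rows = predicted, columns = actual).
confusion_metrics <- function(tn, fn, fp, tp) {
  c(accuracy    = (tp + tn) / (tp + tn + fp + fn),
    sensitivity = tp / (tp + fn),   # share of actual murderers flagged
    specificity = tn / (tn + fp),   # share of non-murderers cleared
    fn_rate     = fn / (tp + fn))   # share of actual murderers missed
}

model1 <- confusion_metrics(tn = 6602, fn = 50, fp = 3,   tp = 1)
model2 <- confusion_metrics(tn = 5651, fn = 12, fp = 954, tp = 39)

round(rbind(model1, model2), 3)
```

Model 1 scores higher on accuracy but flags only 1 of the 51 murderers in the test set. Model 2 flags 39 of 51 at the cost of 954 false positives, which fits the stated priority of avoiding false negatives.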

Appendix F

Comparison to current model

  • The current risk assessment tool has 43% accuracy and 32% false negatives